Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synapse #708

Closed
wants to merge 22 commits into from
Closed

Synapse #708

wants to merge 22 commits into from

Conversation

eryanRM
Copy link

@eryanRM eryanRM commented Oct 24, 2023

Description

Initial PR bringing Synapse functions into subclass

Note: new errors (ex. 'str' mapping) popped up when running updated tests

ex 1. resolved: 'str' mapping errors in several test_job_client.py
ex 2. tests/load/test_job_client.py::test_load_with_all_types now failing due to column count not matching table

Related Issues

  • Fixes #...
  • Closes #...
  • Resolves #...

Additional Context

This PR is for David's review to confirm we are on correct path

@netlify
Copy link

netlify bot commented Oct 24, 2023

Deploy Preview for dlt-hub-docs failed.

Name Link
🔨 Latest commit 2adc3b2
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/65401c75a4f170000873e67b



@contextmanager
def begin_transaction(self) -> Iterator[DBTransaction]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly happens when we use the same tx code we have in the mssql implementation? why can't we use that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shifted back to mssql implementation. Current approach helped with error handling; however, we aren't facing transaction errors anymore and can switch back.

original_sql = sql

# Check if it's a multi-row insert
is_multi_row_insert = (
Copy link
Collaborator

@sh-rp sh-rp Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should implement a custom InsertValuesLoadJob that does these conversions, and not hack the execute_sql.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback, @sh-rp. How about a subclass of InsertValuesLoadJob tailored for Microsoft Synapse that lives in synapse.py?

Given that DLT handles multiple destinations and insert_job_client.py provides a foundation for all of them, I can isolate Synapse-specific conversions and avoid execute_sql

SynapseInsertValuesLoadJob

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sh-rp circling back to this, please let me know if this is in right direction. generate_insert_query() now lives in synapse.py

Instead of taking a SQL string (vulnerable to injection) it now takes comparable params to InsertValuesLoadJob

Copy link
Collaborator

@sh-rp sh-rp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @eryanRM, this PR is looking much better now. I have some requests:

  • Please Make sure the branch is up to date with devel, there are some small changes in the way the loader resolved which tables are needed on staging etc. You can probably just use the settings from the mssql client for that
  • Please make sure the tests are actually executed so I can see what passes and what does not. This includes removing debug/test code and making sure the poetry synapse extra works.
  • Generally speaking it seems like there is a lot of code shared between mssql and synapse. I would be better to have common baseclasses for the two of them instead of having the same code in separate places.

@eryanRM
Copy link
Author

eryanRM commented Oct 27, 2023

Hi @sh-rp, I appreciate the feedback. This has been an illuminating process

  • Updated with Devel. Please let me know if it's still meaningfully behind
  • I removed debug and print statements. Can you see tests on your end? Locally, I dropped to 40 passing tests after syncing with devel and migrating / modifying generate_insert_query to work in subclass. I am confident I can get back to passing (while keeping as much common with mssql as possible)
  • Will work on baseclass through weekend and consult with @steinitzu . Priority has been getting back to passing vast majority of tests but with these improvements implemented (less hacky)

@eryanRM
Copy link
Author

eryanRM commented Oct 31, 2023

Refreshing Synapse branch w/ 'devel'. Continuing as new PR

@eryanRM eryanRM closed this Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants